Traffic is light and it's still 9 million flows a day: Aggregation and Reporting.
This page is intended as a discussion of how to leverage NetFlow tools in monitoring, managing, and planning growth in your network.

Anatomy of a flow

A 'flow' is a data record describing a single network connection: its start and end times, source and destination address/port, number of packets, number of bytes, and optionally source and destination AS. Depending on flow version, other information may be included. Strictly speaking, NetFlow records are unidirectional, so each direction of a connection is exported as its own record; for simplicity, this page counts a connection as one flow.

To provide an example, take the action of opening a web page and break it down. The first phase is connecting to the web server to fetch the page you're after. That's a single flow by itself. Back 'in the old days', each item on a page that had to be retrieved (each image, for example) was an additional connection. The advent of persistent connections and 'pipelining' in web browsers simplified this by reusing the original connection, reducing the number of connections to a web server. If the page pulls in content hosted on other servers, each of those is a separate connection. Pulling up cnn.com's home page may result in as many as 6 to 10 separate network flows.

Collection

Whether your chosen network platform, homogeneous or otherwise, is capable of NetFlow export is a minor issue. Even without explicit support, it's still possible to make use of this tool in creative ways.

Exporting flows from your networking hardware

Cisco and Juniper notably support NetFlow export from their platforms. In most cases, enabling export is a fairly simple and well-documented exercise. Be judicious in this respect. Don't jump into a core device and just flip the switch. Take the time to gauge your network: packets per second, number of talking hosts, average throughput per minute. Find a small segment that you feel provides a representative sample of typical traffic on your network, and focus on that for a few days, if not a week, to get a good idea of your collection and storage requirements.
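To make the trial-segment idea concrete, here is a minimal sketch of turning a few days of observation into rough collection and storage figures. The function name, the per-flow storage size, and the trial numbers are all hypothetical; how many bytes a stored flow takes depends entirely on your collector and storage format, so measure it rather than trusting the value used here.

```python
def project_storage(flows_seen, trial_days, bytes_per_flow, retention_days):
    """Project daily flow count and retained storage from a trial capture.

    All inputs are measurements or assumptions you supply; nothing here
    is specific to any particular collector.
    """
    flows_per_day = flows_seen / trial_days
    retained_bytes = flows_per_day * retention_days * bytes_per_flow
    return flows_per_day, retained_bytes


# Hypothetical trial: 7 days on one segment, 2.1 million flows seen,
# an assumed 50 bytes stored per flow, and a 90-day retention policy.
per_day, total = project_storage(2_100_000, 7, 50, 90)
print(f"{per_day:,.0f} flows/day, {total / 1e9:.2f} GB retained")
```

The result only covers the segment you sampled. Scaling it to the whole network is a judgment call that depends on how representative that segment really is.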
Pay close attention to memory and CPU utilization on your chosen export device, as this may be the key factor in the scope and method of your deployment.

Tapping your network

In the event that you've got router hardware that just doesn't support NetFlow export, you can tap your network links in a couple of ways. The first is a logical tap, implemented at the switch level by mirroring all traffic on an egress port to another port on the switch. With budget constraints in mind, you could also use a dumb hub for this. The second is a physical tap. There's an excellent write-up on both the science and the application of physical ethernet taps in IDS/Honeynet environments here. You can use the same methods to get access to egress links if they fit your needs.

Once you've got a tap or mirror of the link you want to analyze, a tool like softflowd will give you a very inexpensive way to start extracting flow data from any link. For networks on a budget, this is a great way to leverage the rich data set found in NetFlow exports. Obviously, a tap is also a great way to get Snort or similar IDS tools on the wire without disrupting normal operations.

Storage

Some people consider 9 million rows in a database to be of good size. They've obviously never had to store raw flows for any period of time. Just to provide an example, for the month of February my office generated 125 million flows, using about 5.8 gigs of space. With a three to six month retention policy, you do the math: at that rate it comes to roughly 17 to 35 gigs of raw flows. Then think about your Security team wanting to use it effectively in forensic analysis, or your backbone planning teams running reports against it to apply QoS or install new circuits.

Effective Aggregation

Holy war, Batman! Put five network management engineers in a room and ask them the best way to do this. It's funnier than asking an equal number of Unix nerds which editor is best.
Depending on your role in both your organization and the internetwork as a whole, your reporting requirements will invariably differ and have their own quirks. IP addresses very naturally group in two ways: by subnet and by source AS. Each flow also carries the IP protocol and port numbers, which identify the application protocol involved (http, ssh, snmp, etc.). These data points provide good generic handles for deciding how to roll up your data into useful summaries. Coupled with hour, day, and month factors, it's pretty easy to find an aggregation method that fits your reporting needs.
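As one possible illustration of such a roll-up, the sketch below sums flows, packets, and bytes per hour, source subnet, and service. Everything in it is an assumption made for the example: the `Flow` record is a simplified, hypothetical stand-in for whatever your collector stores, the port-to-service map is deliberately tiny, the /24 prefix length assumes IPv4, and the sample addresses come from the reserved documentation ranges.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from ipaddress import ip_network


@dataclass(frozen=True)
class Flow:
    """Hypothetical, simplified flow record; real exports carry more fields."""
    start: datetime
    src_addr: str
    dst_port: int
    packets: int
    octets: int


# Illustrative port-to-service map; extend it to match your own traffic.
SERVICES = {22: "ssh", 80: "http", 161: "snmp"}


def rollup(flows, prefix_len=24):
    """Sum (flows, packets, bytes) per (hour, source subnet, service)."""
    totals = defaultdict(lambda: [0, 0, 0])
    for f in flows:
        # Truncate the start time to the top of the hour.
        hour = f.start.replace(minute=0, second=0, microsecond=0)
        # strict=False masks off the host bits, e.g. 192.0.2.17 -> 192.0.2.0/24.
        subnet = ip_network(f"{f.src_addr}/{prefix_len}", strict=False)
        service = SERVICES.get(f.dst_port, "other")
        row = totals[(hour, str(subnet), service)]
        row[0] += 1
        row[1] += f.packets
        row[2] += f.octets
    return {key: tuple(row) for key, row in totals.items()}


# Made-up sample flows for demonstration only.
SAMPLE = [
    Flow(datetime(2024, 1, 1, 9, 5), "192.0.2.17", 80, 10, 4000),
    Flow(datetime(2024, 1, 1, 9, 40), "192.0.2.200", 80, 6, 2500),
    Flow(datetime(2024, 1, 1, 9, 59), "198.51.100.3", 22, 30, 9000),
    Flow(datetime(2024, 1, 1, 10, 1), "192.0.2.17", 8080, 2, 120),
]

for key, (count, packets, octets) in sorted(rollup(SAMPLE).items()):
    print(key, count, packets, octets)
```

Swapping the key is all it takes to change the report: truncate to the day or month instead of the hour, or replace the subnet with a source AS field if your export includes one.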